NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

Data Structures for Data-Intensive Applications: Tradeoffs and Design Guidelines

https://doi.org/10.1561/1900000059

Athanassoulis, Manos; Idreos, Stratos; Shasha, Dennis (July 2023, Foundations and Trends® in Databases)

Full Text Available
Diversity, Equity and Inclusion Activities in Database Conferences: A 2023 Report

https://doi.org/10.1145/3685980.3685996

Amer-Yahia, Sihem; Agrawal, Divyakant; Amsterdamer, Yael; Bhowmick, Sourav S; Borovica-Gajic, Renata; Camacho-Rodríguez, Jesús; Cao, Jinli; Catania, Barbara; Chrysanthis, Panos K; Curino, Carlo; et al (July 2024, ACM SIGMOD Record)

The Diversity, Equity and Inclusion (DEI) initiative started as the Diversity/Inclusion initiative in 2020 [4]. The current report summarizes our activities in 2023.
Full Text Available
The case for distributed shared-memory databases with RDMA-enabled memory disaggregation

https://doi.org/10.14778/3561261.3561263

Wang, Ruihong; Wang, Jianguo; Idreos, Stratos; Özsu, M. Tamer; Aref, Walid G. (September 2022, Proceedings of the VLDB Endowment)

Memory disaggregation (MD) allows for scalable and elastic data center design by separating compute (CPU) from memory. With MD, compute and memory are no longer coupled into the same server box. Instead, they are connected to each other via ultra-fast networking such as RDMA. MD can bring many advantages, e.g., higher memory utilization, better independent scaling (of compute and memory), and lower cost of ownership. This paper makes the case that MD can fuel the next wave of innovation in database systems. We observe that MD revives the great debate of "shared what" in the database community. We envision that distributed shared-memory databases (DSM-DB, for short), which have not received much attention before, can be promising in the future with MD. We present a list of challenges and opportunities that can inspire next steps in system design, making the case for DSM-DB.
Full Text Available
Proteus: A Self-Designing Range Filter

https://doi.org/10.1145/3514221.3526167

Knorr, Eric R.; Lemaire, Baptiste; Lim, Andrew; Luo, Siqiang; Zhang, Huanchen; Idreos, Stratos; Mitzenmacher, Michael (June 2022, SIGMOD 2022)

Full Text Available
SNARF: a learning-enhanced range filter

https://doi.org/10.14778/3529337.3529347

Vaidya, Kapil; Chatterjee, Subarna; Knorr, Eric; Mitzenmacher, Michael; Idreos, Stratos; Kraska, Tim (April 2022, Proceedings of the VLDB Endowment)

We present Sparse Numerical Array-Based Range Filters (SNARF), a learned range filter that efficiently supports range queries for numerical data. SNARF creates a model of the data distribution to map the keys into a bit array, which is stored in a compressed form. Together, the model and the compressed bit array constitute SNARF and are used to answer membership queries. We evaluate SNARF on multiple synthetic and real-world datasets, both as a stand-alone filter and integrated into RocksDB. For range queries, SNARF provides up to 50x better false positive rate than state-of-the-art range filters, such as SuRF and Rosetta, with the same space usage. We also evaluate SNARF in RocksDB as a filter replacement for filtering requests before they access on-disk data structures. For RocksDB, SNARF can improve the execution time of the system by up to 10x compared to SuRF and Rosetta for certain read-only workloads.
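To make the "model maps keys into a bit array" idea concrete, here is a minimal toy sketch in the spirit of SNARF, not the authors' implementation. Assumptions (all hypothetical): the model is a piecewise-linear approximation of the keys' empirical CDF built from every `sample_every`-th key, the bit array is left uncompressed (SNARF compresses it), and the class and parameter names are invented for illustration. Because the key-to-position mapping is monotone, a range query never misses a stored key; it can only return false positives.

```python
import bisect
from typing import Sequence


class LearnedRangeFilter:
    """Hypothetical toy learned range filter (SNARF-inspired sketch).

    Assumption: the model is a piecewise-linear CDF over sampled keys and
    the bit array is stored uncompressed, unlike SNARF.
    """

    def __init__(self, keys: Sequence[float], bits_per_key: int = 8, sample_every: int = 16):
        ks = sorted(set(keys))
        if not ks:
            raise ValueError("need at least one key")
        n = len(ks)
        self.lo_key, self.hi_key = ks[0], ks[-1]
        self.n_bits = n * bits_per_key
        # Model: every sample_every-th key paired with its CDF value i / n.
        idx = list(range(0, n, sample_every))
        if idx[-1] != n - 1:
            idx.append(n - 1)
        self.xs = [ks[i] for i in idx]
        self.ys = [i / n for i in idx]
        # Set one bit per key at its model-predicted position.
        self.bits = bytearray((self.n_bits + 7) // 8)
        for k in ks:
            p = self._pos(k)
            self.bits[p >> 3] |= 1 << (p & 7)

    def _cdf(self, x: float) -> float:
        """Monotone piecewise-linear CDF estimate, clamped at both ends."""
        xs, ys = self.xs, self.ys
        if x <= xs[0]:
            return ys[0]
        if x >= xs[-1]:
            return ys[-1]
        j = bisect.bisect_right(xs, x)  # xs[j - 1] <= x < xs[j]
        x0, x1, y0, y1 = xs[j - 1], xs[j], ys[j - 1], ys[j]
        y = y0 + (y1 - y0) * (x - x0) / (x1 - x0)
        return min(max(y, y0), y1)  # keep float rounding from breaking monotonicity

    def _pos(self, x: float) -> int:
        # Order-preserving key -> bit mapping: this is what rules out false negatives.
        return min(self.n_bits - 1, int(self._cdf(x) * self.n_bits))

    def may_contain_range(self, lo: float, hi: float) -> bool:
        """False means no key lies in [lo, hi]; True may be a false positive."""
        if lo > hi or hi < self.lo_key or lo > self.hi_key:
            return False
        a, b = self._pos(lo), self._pos(hi)
        return any(self.bits[p >> 3] >> (p & 7) & 1 for p in range(a, b + 1))

    def may_contain(self, x: float) -> bool:
        return self.may_contain_range(x, x)
```

For example, with keys `[0, 1000]`, `sample_every=1`, and the default 8 bits per key, the keys land on bits 0 and 8, so the query `[400, 600]` maps to bits 3 to 4, finds them empty, and is rejected. The linear scan over `range(a, b + 1)` keeps the sketch short; a real filter would answer "any bit set in this range" with a rank structure over the compressed array instead.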
Full Text Available
Optimal Column Layout for Hybrid Workloads

https://doi.org/10.14778/3358701.3358707

Athanassoulis, Manos; Bøgh, Kenneth; Idreos, Stratos (September 2019, Proceedings of the VLDB Endowment)

Data-intensive analytical applications need to support both efficient reads and writes. However, a data layout that is usually good for an update-heavy workload is not well-suited for a read-mostly one, and vice versa. Modern analytical data systems rely on columnar layouts and employ delta stores to inject new data and updates. We show that for hybrid workloads we can achieve close to one order of magnitude better performance by tailoring the column layout design to the data and query workload. Our approach navigates the possible design space of the physical layout: it organizes each column's data by determining the number of partitions, their corresponding sizes and ranges, and the amount of buffer space and how it is allocated. We frame these design decisions as an optimization problem that, given workload knowledge and performance requirements, provides an optimal physical layout for the workload at hand. To evaluate this work, we build an in-memory storage engine, Casper, and we show that it outperforms state-of-the-art data layouts of analytical systems for hybrid workloads. Casper delivers up to 2.32x higher throughput for update-intensive workloads and up to 2.14x higher throughput for hybrid workloads. We further show how to make data layout decisions robust to workload variation by carefully selecting the input of the optimization.
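The read/write tension behind "choose the number of partitions" can be seen in a deliberately tiny cost model. This is a hypothetical illustration, not Casper's optimization model: it assumes a column of `n` values split into `k` equal range partitions, a point read that scans one partition (about `n / k` values), and an insert that ripples one value through about `k` partitions. Under those assumptions the cost is `reads * n / k + writes * k`, which is minimized near `k = sqrt(reads * n / writes)`. It ignores partition ranges, unequal sizes, and buffer allocation, all of which the paper's optimizer does consider.

```python
import math


def workload_cost(k: int, n: int, reads: float, writes: float) -> float:
    """Hypothetical per-workload cost for n values in k equal range partitions.

    Assumption: each read scans one partition (~n / k values) and each
    write ripples one value through ~k partitions.
    """
    return reads * n / k + writes * k


def best_partition_count(n: int, reads: float, writes: float) -> int:
    """Exhaustively pick the k in [1, n] that minimizes the toy cost."""
    return min(range(1, n + 1), key=lambda k: workload_cost(k, n, reads, writes))


def approx_partition_count(n: int, reads: float, writes: float) -> int:
    """Closed-form optimum of the toy model, clamped to [1, n]."""
    if writes == 0:
        return n  # read-only: finest partitioning minimizes scan cost
    return max(1, min(n, round(math.sqrt(reads * n / writes))))
```

With `n = 10_000` and a 90% read / 10% write mix, both functions pick 300 partitions; a read-only mix drives the toy model to one value per partition, and a write-only mix to a single partition. The point is only the shape of the tradeoff: more partitions make reads cheaper and updates costlier.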
Full Text Available
Small Data

https://doi.org/10.1109/ICDE.2017.216

Kennedy, Oliver; Hipp, D. Richard; Idreos, Stratos; Marian, Amelie; Nandi, Arnab; Troncoso, Carmela; Wu, Eugene (April 2017, IEEE 33rd International Conference on Data Engineering (ICDE))

Data is becoming increasingly personal. Individuals regularly interact with a wide variety of structured data, from SQLite databases on phones, to HR spreadsheets, to personal sensors, to open government data appearing in news articles. Although these workloads are important, many of the classical challenges associated with scale and Big Data do not apply. This panel brings together experts in a variety of fields to explore the new opportunities and challenges presented by "Small Data".
Full Text Available
